Value Iteration Working With Belief Subsets
Authors
Abstract
Value iteration is a popular algorithm for solving POMDPs, but it is inefficient in practice, primarily because it must conduct value updates for every belief state in the (continuous) belief space. In this paper, we study value iteration working with a subset of the belief space, i.e., it conducts value updates only for belief states in the subset. We present a way to select a belief subset and describe an algorithm that conducts value iteration over the selected subset. The algorithm is attractive in that it works with a belief subset while retaining the quality of the generated values. Given a POMDP, we show how to determine a priori whether the selected subset is a proper subset of the belief space. If so, the algorithm offers advantages in both space (representation) and time (efficiency).
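The paper's own subset-selection scheme is not reproduced in this abstract. As a rough illustration of the general idea it builds on, the sketch below runs point-based value updates over a small, fixed belief subset instead of the full continuous belief space. The 2-state, 2-action, 2-observation POMDP and all numbers are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative toy POMDP (all numbers are assumptions, not from the paper).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a][s][s']: transition probs
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],      # O[a][s'][o]: observation probs
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])       # R[a][s]: immediate rewards
gamma = 0.95

def backup(belief_set, alphas):
    """One point-based backup: keep one alpha-vector per belief point."""
    new_alphas = []
    for b in belief_set:
        best_val, best_alpha = -np.inf, None
        for a in range(2):
            alpha_a = R[a].astype(float).copy()
            for o in range(2):
                # g[s] = sum_{s'} T[a][s][s'] * O[a][s'][o] * alpha[s']
                gs = [T[a] @ (O[a][:, o] * al) for al in alphas]
                alpha_a += gamma * max(gs, key=lambda g: b @ g)
            if b @ alpha_a > best_val:
                best_val, best_alpha = b @ alpha_a, alpha_a
        new_alphas.append(best_alpha)
    return new_alphas

# Value updates are conducted only at these three belief points.
belief_set = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
alphas = [np.zeros(2)]
for _ in range(50):
    alphas = backup(belief_set, alphas)
values = [float(b @ max(alphas, key=lambda a: b @ a)) for b in belief_set]
```

Because only finitely many belief points are updated, each sweep costs a fixed number of backups rather than a pass over the continuous simplex; the trade-off studied by restricted approaches is how well values at the chosen subset carry over to the rest of the belief space.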
Similar resources
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-...
Impact of reconstruction method on quantitative parameters of 99mTc-TRODAT-1 SPECT
Introduction: Quantitative evaluation is recommended to improve the diagnostic ability and serial assessment of dopamine transporter (DAT) density scans. We decided to compare ordered subsets expectation-maximization (OSEM) with filtered back-projection (FBP), and to investigate the impact of different iteration numbers and cut-off frequencies on SBR values. Methods: ...
Value Iteration over Belief Subspace
Partially Observable Markov Decision Processes (POMDPs) provide an elegant framework for AI planning tasks with uncertainties. Value iteration is a well-known algorithm for solving POMDPs. It is notoriously difficult because at each step it needs to account for every belief state in a continuous space. In this paper, we show that value iteration can be conducted over a subset of belief space. T...
Incremental Least Squares Policy Iteration for POMDPs
We present a new algorithm, incremental least squares policy iteration (ILSPI), for finding the infinite-horizon policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the value function by minimizing the Bellman residual and it performs policy improvement in reachable belief states. A number of optimal basis functions are dete...
Point-based value iteration: An anytime algorithm for POMDPs
This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyperplane per point, it is able to successfully solve large problems, incl...
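The PBVI abstract above mentions using stochastic trajectories to choose belief points. As a hedged sketch of that expansion step, the code below simulates random action/observation sequences on an invented 2-state model and keeps the reached beliefs as update points; the uniform observation sampling is a simplification of PBVI's actual expansion heuristics.

```python
import random

# Invented toy model (not from the PBVI paper): 2 states, 2 actions, 2 obs.
T = {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.5, 0.5], [0.5, 0.5]]}  # T[a][s][s']
O = {0: [[0.8, 0.2], [0.3, 0.7]], 1: [[0.5, 0.5], [0.5, 0.5]]}  # O[a][s'][o]

def belief_update(b, a, o):
    """Bayes update: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b[s]."""
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(2))
              for s2 in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm] if z > 0 else b

def expand(belief_set, steps=20, seed=0):
    """Follow one stochastic trajectory, collecting every belief reached."""
    rng = random.Random(seed)
    points = list(belief_set)
    b = points[0]
    for _ in range(steps):
        a = rng.randrange(2)
        o = rng.randrange(2)  # crude: observations sampled uniformly here
        b = belief_update(b, a, o)
        points.append(b)
    return points

points = expand([[1.0, 0.0]])
```

Value backups are then performed only at the collected points, which is what makes PBVI an anytime method: more expansion rounds yield a denser belief set and a better approximation.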
Publication date: 2002